ABSTRACT

This pilot study on the subject access problems of patrons of
small to medium size libraries was designed to measure the
extent to which users' vocabularies matched the search vocabulary
of bibliographic records in the card catalog, and to enhance
subject access by developing a microcomputer system which
integrated Library of Congress Subject Headings (LCSH) with the
natural language of the users. Three public libraries in Virginia
were selected as test sites because their users encompassed a
heterogeneous population that cut across demographic and
socioeconomic indicators. Data collected on exception cases
(times when users asked librarians for assistance in finding
information) revealed that of the 412 questions analyzed, almost
60% produced no match between the language of the information
seekers and the information organizers. The online system which
was developed allows directed browsing and puts descriptors into
context. It contains eight fields: Subject, the primary access
point containing the users' vocabulary; Enter, which serves a
cross-referencing function; LC, a selection of relevant Library
of Congress subject headings; Broader, Narrower, and Related,
three fields which supply hierarchically oriented connections;
Scope Notes, which define the meaning of records in the Subject
field; and Bibliography, which lists other pertinent holdings.
(Author/THC)
ABSTRACT

A pilot study, recently conducted in Virginia, concentrated
on the heretofore virtually ignored subject access problems of
the small to medium size library. The research was designed 1) to
measure the extent to which users' vocabularies matched the
search vocabularies of bibliographic records found in the
traditional card catalog, and 2) to enhance subject access by
developing a microcomputer driven system which integrated Library
of Congress Subject Headings (LCSH) with the natural language of
users.

Three public libraries - Pittsylvania County, Roanoke County
and Danville - served as the test sites. Public libraries were
selected, since their users encompass a heterogeneous population
that cuts across demographic and socio-economic indicators. Data
collected on exception cases, that is, times when users asked
librarians for assistance in finding information, revealed that
of the 412 questions analyzed, close to 60 percent produced no
match between the language of the information seekers and the
information organizers. The exception cases were identified for
enhanced access, since they recorded subject areas useful to
current library clientele but difficult for them to retrieve.

The online system developed allows directed browsing and
puts descriptors into context. It contains eight fields. The
Subject field, the primary access point, contains the user's
vocabulary. The Enter field serves a cross-referencing function.
The Broader, Narrower and Related fields supply hierarchically
oriented connections. The Scope Note defines the range and
meaning of the subject. Other pertinent holdings are listed in
the Bibliography field. The system, which permits truncated
searching, is available at low cost.
INTRODUCTION

The myth of the known item search was put to rest in a study
of online catalog use conducted by the Council on Library
Resources. Through the investigation it was determined that the
major portion of users' searches were subject searches, that is,
explorations for unknown items. The June, 1982 meeting devoted
to the topic, sponsored by the Council on Library Resources,
added further credence to the fact that subject access demands
consideration as one of the major issues facing libraries of all
types now and in the future.

To date, attempts to resolve problems in subject access have
focused on the large research institutions with minimal attention
given the small to medium size organization. This is not
surprising, since the early computer technology available to
address subject access was limited to large mainframe systems or
minicomputers often beyond the financial reach of the smaller
library. With the advent of the microcomputer new opportunities
are available to facilitate subject searches at low cost. Among
them is the ability to develop systems capable of providing a
mediating link between the natural language of information
seekers and the precoordinate controlled vocabularies employed by
information organizers.

THE PROBLEM 

Prior to attempting to develop a system capable of enhancing
subject access in the small to medium sized library, it was
necessary to ascertain the level of access currently afforded
users through searches conducted via the Library of Congress
Subject Headings (LCSH) found in the traditional card catalog.
Intuitive contentions that users' search terms frequently have
nothing in common with LCSH surfaced repeatedly in the
professional literature. Attempts to locate evidence for these
contentions, however, led to the conclusion that very little had
been reported concerning the operation of the smaller library in
this regard. Within the public library sphere alone, 82 percent,
or more than 12,000 of the total 14,831 libraries, serve
populations under 25,000. When other small to medium sized
libraries and school libraries are added, the number is further
inflated and the lack of reliable information on which to develop
systems becomes more ironic. In fact, it is apparent that the
problems of subject access in the smaller library are virtually
unknown beyond those information professionals directly involved
with them.

CREATING THE FRAMEWORK 

The ongoing objectives of the work reported here have been to: (1)
identify and measure the extent users' vocabularies match the
search vocabularies of the bibliographic records in card
catalogs, (2) create a retrieval tool that integrates LCSH
with users' natural language search vocabularies, (3) develop a
low-cost, online system to enhance subject access, and (4)
compare the success rate of searches done on the traditional
system with that of the augmented system. Objectives one through
three have been completed in the pilot study reported on here.
Objective four remains for the future, when there is sufficient
information in and on the system to make such a test valid.

For the purpose of this investigation, subject access was
defined according to Butler as "the set of processes and
techniques used in the representation of a work so that its
contents may become known to one desiring the information therein
without prior knowledge of the existence of the work, its
authorship, or location."

To determine the extent of match between users' vocabularies
and the LCSH search vocabulary, exception cases were recorded and
analyzed. Exception cases were defined as instances when users
requested help from librarians in finding information. If users
did not seek assistance, it was assumed that their search
vocabularies and the vocabulary of the LCSH coincided. Several
further assumptions were made, including that:

1. Small libraries are likely to use LC cataloging as is.

2. The LCSH will be the basis from which most smaller
   libraries will initiate online public catalogs or
   modified versions of such catalogs as they endeavor to
   increase subject access. In fact, one of the major
   weaknesses of current online catalogs is that they are
   too frequently merely automated versions of Library of
   Congress catalog cards.

The pilot was conducted in three public libraries in
Pittsylvania County, Danville and Roanoke, Virginia. They were
selected as the test sites, since their users encompass a
heterogeneous population, one that cuts across demographic and
socioeconomic indicators with Pittsylvania, Danville and Roanoke
found in ascending order in most instances. The three vary from
small to medium in size with the highest budget significantly
less than one million dollars; they were located in rural, town
and urban settings with differing population profiles. Table I
provides an overview of the sites.

The libraries within the three communities were equally
varied as Table II attests.

Table I. Profile of the Virginia Test Sites

                                                                                          Percent of        Percent of
                                                Population              Median Years of   Persons 25 Years  Persons 25 Years
                                                Density Per  Median     School Completed  and Over Who Are  and Over Who Are
                                    Population  Square Mile  Household  by Persons 25     High School       College
Library                             Served      Served       Income     Years and Over    Graduates         Graduates

Pittsylvania County Public Library  66,147      66           $14,020    10.4              37.8              5.7
Danville Public Library             45,642      268          $13,413    11.7              47.5              10.5
Roanoke County Public Library       72,945      290          $20,205    12.6              70.0              17.7

Table II. Profile of the Three Libraries in the Pilot Study

              Full-Time
              Equivalent   Professional  Total                   Volumes*  Circulation*
Library       Staff        Positions     Budget*   Circulation*  Held      Per Volume

Pittsylvania  10.75        2             $184,115  105,942       47,877    2.2
Danville      16.7         5             $339,966  193,019       83,918    2.3
Roanoke       33.5         8             $643,778  654,685       209,326   3.1

*Figures are taken from the 1982 fiscal year.

Data were collected over a period of two weeks on a time
sampling basis in October, 1983 and again in January, 1984. All
questions recorded for analysis were exception cases. No known
item searches were included. If, upon perusing the LCSH, a
variation of the term sought was found in the catalog, that
search was denoted as a match (M) between the user's vocabulary
and the LCSH. For example, if information requested on diets was
found under DIETING, it was tabulated as an M. When, however,
users' terms and LCSH lacked similar nomenclature, that search
was denoted as no match (N). For example, when information
requested on Gun Control was found under FIREARMS - LAW AND
LEGISLATION, it was calculated as an N. Finally, data were
compiled on exception cases, designated N, which required the
formulation of a sophisticated search strategy (S). For example,
a request for a book on Song Writing, located under a SEE
reference pointing to the LCSH Music, Popular Writing, required
no strategy. However, information sought on the effect of
discontinuing food subsidies at a local hospital on the
nutritiousness of student nurses' diets required a more complex
approach and was denoted as an S. Table III supplies the results
of the subject searches in the exception cases.

There were 412 total exception cases recorded. In 170, or
41%, of them there was a match between LCSH and the users'
vocabulary and in 242, or 59%, there was no such match. Danville
and Pittsylvania County had their highest tallies among the
exception cases in the no match category. For them the next
highest figures were matches and the third were questions
requiring search strategies. In Roanoke, however, questions
requiring search strategies came in first, no match second and
matches third. In Roanoke, then, the information requested
required a higher proportion of sophisticated search
strategies, or simply, more sophisticated users asked more
sophisticated questions.



Table III. Subject Search Analysis

                           Match          No Match       Questions
              Total        Between LCSH   Between LCSH   Requiring
              Questions    and User       and User       Search
Libraries     Asked        Vocabulary     Vocabulary     Strategies

Pittsylvania  56 (.14)     20 (.36)       36 (.64)       2 (.06)
Danville      316 (.76)    142 (.45)      174 (.55)      12 (.07)
Roanoke       40 (.10)     8 (.20)        32 (.80)       21 (.66)
Total         412 (1.00)   170 (.41)      242 (.59)      35 (.14)
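
The proportions in Table III can be rechecked from the raw counts. The short Python sketch below is an illustration only, not part of the pilot system. It assumes, from the figures themselves (for example 2 of 36 is .06), that the proportion of questions requiring search strategies is taken over the no match cases rather than over all questions. The names COUNTS, summarize and totals are hypothetical.

```python
# Counts transcribed from Table III:
# (total questions, match, no match, questions requiring search strategies)
COUNTS = {
    "Pittsylvania": (56, 20, 36, 2),
    "Danville": (316, 142, 174, 12),
    "Roanoke": (40, 8, 32, 21),
}


def summarize(counts):
    """Return per-library proportions for each category."""
    rows = {}
    for library, (total, match, no_match, strategy) in counts.items():
        rows[library] = {
            "match_rate": match / total,
            "no_match_rate": no_match / total,
            # Assumption: strategy cases are a subset of no match cases.
            "strategy_rate": strategy / no_match,
        }
    return rows


def totals(counts):
    """Sum each column across the three libraries."""
    return tuple(sum(column) for column in zip(*counts.values()))


all_total, all_match, all_no_match, all_strategy = totals(COUNTS)
print(all_total, round(all_match / all_total, 2), round(all_no_match / all_total, 2))
```

Summing the columns reproduces the Total row of the table, which is a quick consistency check on the transcription.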
While data must be collected over a longer period of time
and on a national basis for any broad generalizations to be made
with assuredness, this study did confirm suspicions that, at
least in the three case sites, LCSH and users' vocabularies were
sufficiently at variance to initiate the development of a system
which would enhance subject access. As a result, the subset of
holdings denoted as exception cases were identified for enhanced
access in the system design phase, since such cases pointed up
subject areas useful to current library users but difficult for
them to retrieve.

ENHANCING SUBJECT ACCESS 

The objective of the system developed to enhance subject
access is to integrate LCSH with users' search vocabularies as
well as with the search vocabularies of other sources, such as
the Reader's Guide, and to create a hierarchy of the resulting
descriptors. A thesaurus/bibliography was developed and stored at
the Pittsylvania County Public Library on an Apple II
microcomputer with two 5-1/4" floppy disk drives. The software is
PFS: Files by Software Publishing Corporation, which permits the
format of the files to be established, then input, search/edit,
print and delete modes to be activated. The hardware and software
are currently available for approximately $2,200.

The data structure developed to meet the objective was
conceptualized as containing eight fields, electronically
displayed as shown in Figure 1.



FIRST SCREEN

SECOND SCREEN

SUBJECT:
ENTER:
LC:
BROADER SUBJECTS:
NARROWER SUBJECTS:
RELATED SUBJECTS:
SCOPE NOTES:
BIBLIOGRAPHY:

Figure 1. Screen Display of the Thesaurus/Bibliography Structure
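
For readers who want to experiment with the eight-field structure outside PFS: Files, the record can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the system's actual storage format: the class name ThesaurusRecord, its attribute names, and the use of lists for multi-valued fields are all hypothetical. The sample values come from the CHILD ABUSE example in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThesaurusRecord:
    """One thesaurus/bibliography record with the eight fields of Figure 1."""
    subject: str                                        # users' own wording
    enter: str = ""                                     # cross-reference target, if any
    lc: List[str] = field(default_factory=list)         # matching LCSH headings
    broader: List[str] = field(default_factory=list)    # more general subjects
    narrower: List[str] = field(default_factory=list)   # more specific subjects
    related: List[str] = field(default_factory=list)    # similar subjects
    scope_notes: str = ""                               # meaning and range of the subject
    bibliography: List[str] = field(default_factory=list)

    def display(self) -> str:
        """Render the record using the field labels shown in Figure 1."""
        rows = [
            ("SUBJECT", self.subject),
            ("ENTER", self.enter),
            ("LC", "; ".join(self.lc)),
            ("BROADER SUBJECTS", "; ".join(self.broader)),
            ("NARROWER SUBJECTS", "; ".join(self.narrower)),
            ("RELATED SUBJECTS", "; ".join(self.related)),
            ("SCOPE NOTES", self.scope_notes),
            ("BIBLIOGRAPHY", "; ".join(self.bibliography)),
        ]
        return "\n".join(f"{label}: {value}".rstrip() for label, value in rows)


record = ThesaurusRecord(
    subject="CHILD ABUSE",
    lc=["CHILDREN (CARE AND HYGIENE)", "CHILD PSYCHOLOGY"],
)
print(record.display())
```

Empty fields print as a bare label, which mirrors the blank form in the figure.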
The SUBJECT: field, the primary access point, contains words or
phrases taken from the vocabularies of users when they submitted
requests for information. To determine the words and phrases to
include, actual reference questions were collected and analyzed.
As an example, the descriptor CHILD ABUSE was located in the
SUBJECT: field when it was determined that users were more apt to
employ it than the Library of Congress headings CHILDREN (CARE
AND HYGIENE) or CHILD PSYCHOLOGY.

Data in other fields help users find the specific materials
they are seeking. The ENTER: field instructs the user to register
another term in order to retrieve a record containing a
bibliography. This field is also used as a cross-referencing
device, linking a term not used as a primary access point to one
that is. The term may be another word commonly used by patrons,
as MADD for Mothers Against Drunk Driving. The field also assists
the user in another way. If the word or phrase in the SUBJECT:
field represents a complex idea or concept, the user is
instructed to scan the BROADER, NARROWER and RELATED SUBJECT:
fields in which several other terms are offered. The user may
then enter these terms in order to find the bibliography that
will have the most appropriate materials.
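
The cross-referencing role of the ENTER: field can be sketched as a lookup that follows ENTER: pointers until it reaches a record that carries a bibliography. The Python below is an illustration under assumptions: the dictionary layout, the function name resolve, the direction of the MADD reference (acronym pointing to the full name) and the placeholder bibliography entry are hypothetical and are not taken from the pilot data base.

```python
# Hypothetical records keyed by SUBJECT: term.
RECORDS = {
    "MADD": {"enter": "MOTHERS AGAINST DRUNK DRIVING"},
    "MOTHERS AGAINST DRUNK DRIVING": {
        "enter": "",
        "bibliography": ["(placeholder entry)"],
    },
}


def resolve(term, records, max_hops=5):
    """Follow ENTER: references from term to the record that should be read.

    Returns the final SUBJECT: term, or None when the term is unknown,
    the references form a loop, or the chain exceeds max_hops.
    """
    seen = []
    current = term.upper()
    while current in records and current not in seen and len(seen) < max_hops:
        seen.append(current)
        target = records[current].get("enter", "")
        if not target:
            return current          # no further reference: this is the record
        current = target.upper()
    return None


print(resolve("madd", RECORDS))
```

The loop guard matters in practice, since two records that point at each other would otherwise send the user in circles.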

A selection of Library of Congress Subject Headings that
most nearly match the commonly used term are entered in the LC:
field. The intent is to link the user's language with subject
headings found in the library's card catalog, that is, to connect
the user's natural language and the LCSH. This also makes it
possible for the user to return to the card catalog and find more
information, if it is desired.

To suggest the hierarchical structure of the subject
organization, the next three fields, BROADER SUBJECT:, NARROWER
SUBJECT: and RELATED SUBJECTS:, are developed. They also lead the
user to more material on the topic. The BROADER SUBJECT: field
transfers the user to concepts that are more general than those
in the SUBJECT: field. The user may reenter one of these concepts
when shown with the symbol /X//, as the broader subject WASTE
under WASTE DISPOSAL. The words and phrases in the NARROWER
SUBJECT: field provide an opposite function, indicating ideas
embodied in the terms in the SUBJECT: field, such as the narrower
subject HERPES under VENEREAL DISEASE.

The RELATED SUBJECTS: field is used to link terms with
similar meanings to the terms in the SUBJECT: field. These terms
can be suggested by the wide variety of subject headings under
which material on the topic is found. In the case of "HYPNOSIS,"
information is found under "BEHAVIORAL MODIFICATION,"
"REINCARNATION" and "BRAINWASHING," so the user is led to
other perceptions and usages of the term in the SUBJECT: field. A
request for myths and legends prompts three separate
bibliographies in the database: MYTHS, LEGENDS, and MYTHOLOGY.
All are closely related but not identical, so under each term
used in the SUBJECT: field, the other two terms appear in the
RELATED SUBJECT: field. In addition to defining the meaning of
the term in the SUBJECT: field, the RELATED SUBJECT: field gives
an understanding of the scope of the material in the
bibliographies. The SCOPE NOTE: field is used to define
precisely the meaning of the records found in the SUBJECT: field,
as in the records under MYTHS, LEGENDS and MYTHOLOGY.
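
The cross-links among the three hierarchy fields lend themselves to a small worked example. The Python sketch below shows one way to fill RELATED SUBJECT: lists so that each term names the others, and to derive narrower terms by inverting broader ones. Treating the broader and narrower links as strict inverses is an assumption made for illustration; the function names are hypothetical, and the terms are the examples given in the text.

```python
def link_related(terms):
    """Give each term a RELATED SUBJECT: list naming every other term."""
    return {term: [other for other in terms if other != term] for term in terms}


def narrower_from_broader(broader):
    """Invert a term -> broader-terms mapping into broader-term -> narrower-terms."""
    narrower = {}
    for term, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(term)
    return narrower


# MYTHS, LEGENDS and MYTHOLOGY each list the other two as related subjects.
related = link_related(["MYTHS", "LEGENDS", "MYTHOLOGY"])

# Broader links as described in the text; narrower links are derived from them.
broader = {"WASTE DISPOSAL": ["WASTE"], "HERPES": ["VENEREAL DISEASE"]}
narrower = narrower_from_broader(broader)

print(related["MYTHS"])
print(narrower["VENEREAL DISEASE"])
```

Deriving one direction from the other keeps the two fields from drifting apart as terms are added, which is a design choice of this sketch rather than a documented feature of the system.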

In the BIBLIOGRAPHY : field the books and periodicals are 
listed that pertain to the term in the SUBJECT: field. Call 
numbers, authors, titles and publishing information are given.
Precise page numbers for parts of books and periodicals that are
available in the library, on the shelf or microfiche, are 
included. Reading levels are indicated in some instances. When 
the title does not clearly identify the content of the book or 
article, a brief statement summarizing its content is added, 
similar to the scope notes in CIP data. 

CREATING THE DATA BASE

After conceptualization of the data structure was completed,
the procedures were set for inputting data.

The Reference Question Form, shown in Figure 2, was designed for
staff members to capture the necessary information from users'
requests.



REFERENCE QUESTIONS                                Date ________

Check the appropriate description of the information seeker:

Adult ___  Young Adult ___  Child ___  Student ___  Other ___

Approximate Reading Level Required: ________

Question as stated by patron: ________

Question restated, if necessary: ________

General subject area used to find information: ________

Specific sources in which information was found: ________

Check here if no information was found: ___        Staff ________

Figure 2. Collecting Data for System Input

First, the books and periodicals listed under "Specific
sources in which information was found" are carefully checked for
the correct bibliographic information. Each source is also
reviewed for its relevance to the term in the SUBJECT: field.
Using these sources, a clear understanding of the question is
formulated from which the cross-referencing structure is
developed.

Second, the subject headings used to locate the sources in the
card catalog are traced and each checked further for more
pertinent material. From the list of subject headings gathered
and the formal headings used for similar material by the Library
of Congress, several headings are chosen for inclusion in the LC:
field of the record on the basis of their compatibility with the
natural language terms of the user.

Third, a word or phrase that embodies the intent of the
user's query is decided upon; as often as possible the user's
exact wording is employed. With this established, the BROADER,
NARROWER and RELATED SUBJECT: field structure is built.
Information about the source material is gathered from the
flyleaf, table of contents, prefatory material and CIP data. The
need for this framework of cross-referencing proceeded from the
intention to place bibliographies under the most specific term.
For example, under FROGS, which is addressing the specific
question of frog reproduction, EMBRYOLOGY is provided as the
broader term where explanatory material can be found.

Fourth, the bibliographies are entered. By using the most
specific term they are kept short and to the point. Material that
treats the idea or concept in a general manner is located under a
term referred to in the BROADER SUBJECT: field. The form of the
work is indicated where helpful, including formats other than the
book.

While developing the cross-referencing structure, the whole
universe of possibilities connected with the question is not
considered; rather, the subjects and terms are limited to those
that arise easily from the users' questions. If in the future
more detail is required, more terms can be added, and the
structure developed further. As this thesaurus structure emerges,
it becomes a tool to aid in that development.

EQUIPMENT AND DESIGN 

Development of the system was undertaken at the Pittsylvania
County Public Library on an Apple II microcomputer with two
5-1/4" floppy disk drives. The software selected was PFS: Files by
Software Publishing Corporation, which permits the format of the
files to be established, then allows input, search/edit, print
and delete modes to be activated.

The software also allows for truncation of terms, so the
data base can be searched by using words, phrases, and parts of
either when set between double periods (..). Using the
truncated forms slows response time, since the whole data base
will be searched and some irrelevant records retrieved. The
ability to search with truncated terms gives the user and the
designer flexibility. For example, users looking for a
bibliography under MOTHERS AGAINST DRUNK DRIVING (MADD) can enter
the whole phrase if it is remembered exactly, or MOTHERS, DRUNK
DRIVING or parts of either. The hardware and software used in
this test system are currently available for less than $2,000.
The test system was designed for use in the pilot phase of the
project only. The extent of the file is 1,000 items of 128
characters. When the system has been tested sufficiently to
determine the most desirable format, using a custom written
program, the data base will be placed on a Winchester disk which
will hold approximately 33,000 entries.
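
The difference between an exact search and a truncated search can be sketched in Python. This is an illustration, not a description of how PFS: Files works internally: the handling of the double-period delimiters, the function names parse_query and search, and the sample subject list are assumptions. It does follow the behavior described above, in that a truncated query scans every entry and may retrieve more than the user wanted.

```python
# Hypothetical SUBJECT: entries for illustration.
SUBJECTS = [
    "MOTHERS AGAINST DRUNK DRIVING (MADD)",
    "WASTE DISPOSAL",
    "CHILD ABUSE",
]


def parse_query(query):
    """Split a query into (text, truncated).

    Assumption: text set between double periods, as in ..DRUNK..,
    requests a match on part of an entry; anything else must match
    the whole entry.
    """
    q = query.strip().upper()
    if len(q) > 4 and q.startswith("..") and q.endswith(".."):
        return q[2:-2].strip(), True
    return q, False


def search(query, subjects):
    """Return the subjects that satisfy the query."""
    text, truncated = parse_query(query)
    if truncated:
        # Every entry is scanned, which is why truncation is slower.
        return [s for s in subjects if text in s.upper()]
    return [s for s in subjects if s.upper() == text]


print(search("..drunk driving..", SUBJECTS))
print(search("MOTHERS", SUBJECTS))
```

In this sketch the untruncated query MOTHERS finds nothing, while ..MOTHERS.. finds the full entry, which shows why truncation helps a user who remembers only part of a phrase.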

LANGUAGE CONVENTIONS

Abbreviations are not used in retrievable fields, except in
the case of familiar popular acronyms employed by users, such
as AA for Alcoholics Anonymous. Full names are cross-referenced
to the abbreviation, so that errors in searching caused by the
user's inability to remember the exact form of the entry are
avoided. Since users do not invert proper names in formulating
their questions, they are not inverted in the system. In fact,
employing the broader and narrower term structure makes inversion
unnecessary. Plural nouns are used in all subject headings.
Modifiers are kept to a minimum. Occasionally when a phrase such
as "EFFECT OF DIVORCE ON CHILDREN" must be used, they are
carefully considered. Staff consultations are held to help
isolate the wording most likely to be part of the user's
vocabulary.

FOR THE FUTURE

The current system combines a number of features which the
Subject Access Conference touted as ideal, including:

1. A means by which LCSH are integrated with other
   vocabularies and a means to switch between them.

2. A thesaurus screen which offers the user an opportunity
   to do directed "browsing," while avoiding the inherent
   order of alphabets.

3. Reduction of false hits because the thesaurus structure
   puts descriptors into context.

4. Use of class numbers to guide patrons to better search
   strategies, since the bibliographic components of the
   record contain call numbers of relevant materials.

5. Inclusion of journal materials in the subject access
   data bases.

The initial tests have led to modifications in the system.
Data are still being collected on exception cases. A comparison
between the success rates of the standard data base and the
augmented data base is projected to determine whether we have
made a significant improvement in access or merely created
another data base which is equally difficult to negotiate.